Abstract

In this paper we present a novel approach to the determination of fat tails in
financial data by studying the information contained in the limit order book. In an
order-driven market buyers and sellers may submit limit orders, which are executed
if the price touches a pre-specified lower, respectively higher, limit-price. We show
that, in equilibrium, the collection of all such orders - the limit order book - implies
a volatility smile, similar to observations from option pricing in the Black-Scholes
model. We also show how a jump-diffusion process can be explicitly inferred to
account for the volatility smile.

1 Introduction

The organization of a marketplace where buyers and sellers meet to exchange a well
defined asset naturally lies at the heart of the price discovery process. Traditionally
this is done by specialist market makers with a mandate to match bid and ask quotes
from market participants. Such market-order driven ways of trading define the most
"impatient" form of interactions in the market place - orders are executed immediately
as soon as a counterparty is identified that matches the order even at the expense of
getting "filled" at a price that is suboptimal to the initiator of the trade. Such slippage is
symptomatic of market-orders and can be viewed as the "price to pay" for the impatience
to execute. The latter can be avoided by trading limit-orders instead of market-orders.
Each limit order includes a price level and a quantity. A seller would specify a pre-defined
execution price level that is typically set above the current market price, whereas a buyer
would like to purchase below the current market price at a limit-price of her choice. The
important difference from market-orders is that limit-orders never get filled suboptimally,
but may rather not get filled at all or filled only partially in some cases. Hence the market
participant needs to exercise "patience" to see an order completed and to be rewarded
with a premium to current levels. The higher the limit-order price level, the higher the
potential rewards and the higher the patience required to see a trade completed in time.

All limit orders are typically collected by the exchange in a limit order book (LOB) 
that can be accessed by all market participants. According to P. Jain [Jai03], currently
more than half of the world's markets are order-driven.

This raises a sequence of interesting theoretical questions about LOBs, such as: how
should one optimally position oneself for trading in a LOB? What is the information contained
in a LOB and when is it in equilibrium? The recent public availability of LOB data has
sparked extensive studies on the structure of LOBs. The work of J. Bouchaud et al.
[BMP02] provides useful insights into the shape of the LOB from a statistical standpoint.
The study conducted by A. Ranaldo [Ran04] sheds light on aspects of the behavior of
market participants as represented by the change of their impatience preferences as a
function of the bid-ask spread. Further, Z. Eisler et al. in [EKL07] give a detailed look
at how the LOB behaves on different time scales. In [CST08] a stochastic model for
the dynamics of the LOB in continuous time has been developed, allowing for simple
calibration and explicit calculation of certain probabilities of interest. M. Bartolozzi
[Bar10] proposes a multi-agent model for the dynamics of the LOB, with a particular
focus on capturing key features of high-frequency trading. M. Avellaneda et al. [AS06]
also propose a probabilistic framework for a utility optimizing agent in the context of
high-frequency markets. A study conducted in Spanish equity markets by R. Pascual
et al. [PV08] focuses on which pieces of information in the LOB are significant and finds
that the information is concentrated around the best bid and ask orders, while orders
further out in the LOB do not significantly contribute to the price formation process.
Somewhat similarly, I. Rosu [Ros10] proposes a dynamic model for order-driven markets
with asymmetric information. He argues that the price impact of market orders is more
significant than the impact of limit orders by an order of magnitude. Furthermore, R.
Cont et al. [CKS10] study the price impact of order book events and find a surprisingly
simple linear dependence between price changes and an indicator they introduce, which
measures the imbalances between the order flow on the buy and sell sides of the LOB. In
[TKF09] the dynamics of different indicators, such as the bid-ask imbalance, are studied
before and after large LOB events, and significant dependencies are found.

The objective of this paper is to study consequences in situations when a LOB is in 
equilibrium. Following arguments by Rosu [Ros08], equilibrium occurs when, at any time,
there exists an impatience rate, independent of (limit-order price) level, which "discounts"
limit-sell orders at higher prices in favor of smaller ones. Thus an impatience rate strikes
a consistent compromise between higher prices on the one hand, and longer expected 
passage-times to fill on the other hand, throughout the LOB. We study the LOB data 
of the DAX future and show that the assumption of a geometric Brownian motion for 
the price dynamics implies a volatility smile which is reminiscent of the volatility smile 
observed in option markets. This allows us to conjecture and re-engineer non-trivial 
dynamics underlying the LOB. We show that the assumption of a double-exponential 
jump dynamics provides a satisfactory description of the data. Hence this work provides 
further new evidence for the occurrence of fat tails in financial data. In addition it provides 
a novel approach for the determination of jump parameters in finance. 

The paper is outlined as follows - first we formalise the general notion of an impatience
rate - the quantifier of the trade-off between waiting longer but executing the order at a
better price. We then test the assertion that the impatience rate is level-independent by
assuming that the prices follow a geometric Brownian motion (GBM). We find that the
GBM cannot account for a consistent impatience rate, and observe a volatility smile - if
the impatience rate were to be level-independent then limit-orders at higher levels imply
an ever increasing volatility. Considering the evidence we augment the price process by
adding jumps. We conclude by providing evidence that the double-exponential
jump-diffusion (DEJD) price process admits an impatience rate independent of the
limit-order level.

2 Impatience Rate



In this work we aim to extract empirically testable pieces of information from the LOB,
which are also pertinent to order-driven markets, and are as much as possible independent
of the model at hand. In reviewing the literature on LOB modelling, a distinction between
patient and impatient agents has turned out to be the common denominator of much of
the research. Specifically in his seminal paper on LOB dynamics, Rosu [Ros08] relies
explicitly on a parameter $r$, the impatience rate, which acts as a discounting or penalizing
factor for waiting longer for a better fill price. Rosu further proposes an equilibrium
model, in which agents maximize their utility according to the following utility function:

$$f_t := E_t[P_\tau - r(\tau - t)], \quad\text{and}\quad -g_t := E_t[-P_\tau - r(\tau - t)] \tag{1}$$

where $f_t$ is the expected seller utility at time $t$, $\tau$ is the time of the limit order execution,
$P_\tau$ is the limit order price and $r$ is the common impatience rate of sellers and buyers.
Similarly $-g_t$ is the buyer's utility. Regardless of specific price dynamics assumptions and LOB
model, a limit order far away from the current price will take longer to fill, so $r$ weighs
the benefit of a better fill price, which increases utility, against the penalty for waiting
longer, which decreases utility. So the impatience rate should quantify this fundamental trade-off
between waiting longer to execute at a better price and waiting less, but executing at a
worse price. Consequently, if the LOB is in a state of equilibrium, and if capital markets
are efficient, the impatience rate should be the same across limit-order levels.

I. Rosu in [Ros08] shows that such an equilibrium exists in his theoretical framework,
and empirical testing should provide evidence whether the LOB is efficient, or at least 
imply what the price process should look like, if the assumption of information efficiency 
were to hold. Despite playing such a central role in many works on limit-order markets, 
the properties of the impatience rate r have not been examined. 

The utility function as given in (1) is somewhat unfortunate. On the one hand, it
depends on the absolute level of the limit order price, $P_\tau$, which leads to different results
even in Rosu's model for markets which are equivalent up to a price scaling constant.
Also, this approach suppresses an important characterization of the limit order, namely
its distance to the current best offer, or at least to the mid price. In (1) this is expressed
only indirectly through the expected hitting time $E[\tau]$. At this point another drawback
of this particular utility becomes apparent - the fact that $E[\tau] = \infty$ if the asset price is
modelled by an exponential Brownian motion without drift, or if the drift is in a direction
away from the submitted limit order.^1

Since (1) depends on the absolute level of the price, and would generally be infinite for
a Brownian motion price process, we investigate a different utility function. We consider
the currency value of the distance-to-fill, i.e. how much the best offer has to travel until
it meets a trader's limit order, discounted by the impatience rate in a quite literal sense:

$$U_D(r) = |D|\, E[\exp(-r\tau_{S+D}) \mid S_0 = S] \tag{2}$$

Here $D \in \mathbb{R}$ is the distance from the limit order to the best offer $S_0$ at time $t = 0$. Further,
$\tau_{S+D}$ is the first hitting time of the price process $S_t$ describing the evolution of the best
offer, started in $S_0$:

$$\tau_{S+D} := \inf\{t > 0 : S_t = S + D\}.$$



^1 For a discussion of how the GBM should be specified so that the expected hitting time is finite,
cf. M. Yor et al. [JYC09].

3 A Pure Diffusion Setting

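We first adopt a model in which the asset price $(S_t)_{t \ge 0}$ is a geometric Brownian motion
(GBM). Specifically let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, P)$ be a filtered space^2, and $(S_t)_{t \ge 0}$ be a stochastic
process on this filtered space, whose dynamics are given by the stochastic differential
equation (SDE)

$$dS_t = S_t \mu\, dt + S_t \sigma\, dB_t, \qquad S_0 = S$$

where $S, \sigma > 0$, $\mu \in \mathbb{R}$ and $(B_t)_{t \ge 0}$ is a standard Brownian motion, started in 0. The SDE
has the exact solution^3:

$$S_t = S \exp\left(\left(\mu - \frac{\sigma^2}{2}\right) t + \sigma B_t\right) \tag{3}$$

We define the asset price log-returns over an interval $\Delta t > 0$ by

$$l(\Delta t) := \ln\left(\frac{S_{t+\Delta t}}{S_t}\right)$$

and establish the following results:

$$E[l(\Delta t)] = \left(\mu - \frac{\sigma^2}{2}\right)\Delta t \tag{4}$$

$$V[l(\Delta t)] = \sigma^2 \Delta t \tag{5}$$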
Further we note that the Laplace transform of the first hitting time at level $z \in \mathbb{R}$,

$$\tilde\tau_z := \inf\{t > 0 : \tilde B_t = z\},$$

of a standard Brownian motion with drift $\tilde\mu$,

$$\tilde B_t := \tilde\mu t + B_t,$$

is given by^4

$$E[\exp(-r \tilde\tau_z)] = \exp\left(\tilde\mu z - |z| \sqrt{2r + \tilde\mu^2}\right) \tag{6}$$

for some r > 0. Finally a simple transform is needed to obtain a closed formula for the 
expected utility: 



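$$\begin{aligned}
\tau_{S+D} &= \inf\{t > 0 : S_t = S + D\} \\
&= \inf\left\{t > 0 : S \exp\left(\left(\mu - \frac{\sigma^2}{2}\right) t + \sigma B_t\right) = S + D\right\} \\
&= \inf\left\{t > 0 : \left(\frac{\mu}{\sigma} - \frac{\sigma}{2}\right) t + B_t = \frac{1}{\sigma} \ln\left(\frac{S + D}{S}\right)\right\} \\
&= \tilde\tau_z, \qquad \tilde\mu = \frac{\mu}{\sigma} - \frac{\sigma}{2}, \quad z = \frac{1}{\sigma} \ln\left(\frac{S + D}{S}\right), \quad S \in \mathbb{R}_+, \ D \in\, ]-S, \infty[
\end{aligned} \tag{7}$$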



^2 In the rest of this paper we will always assume that any stochastic process we introduce is adapted
to this generic filtered space and that its dynamics are given with respect to the physical probability
measure $P$.


^3 Cf. phHM)

^4 Cf. [BS96] (page 223, 2.0.1).

Now $\tilde\tau_z$ is the first hitting time of a standard Brownian motion with drift $\tilde\mu$ for the hitting
level $z \in \mathbb{R}$. Notice that we allow $z < 0$, so that we can use the same result for hitting
levels above the current price, as well as below the current price. In particular we can use
the same formula for the bid and for the ask side of the book. Now we are in a position
to compute explicitly for $D \in \mathbb{R} \setminus \{0\}$, $r, S > 0$:



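$$\begin{aligned}
U_D(r) &= |D|\, E[\exp(-r \tau_{S+D}) \mid S_0 = S] \\
&= |D|\, E[\exp(-r \tilde\tau_z)] \\
&= |D| \exp\left(\tilde\mu z - |z| \sqrt{2r + \tilde\mu^2}\right) \\
&= |D| \exp\left(\left(\frac{\mu}{\sigma^2} - \frac{1}{2}\right) \ln\left(\frac{S + D}{S}\right) - \frac{1}{\sigma} \left|\ln\left(\frac{S + D}{S}\right)\right| \sqrt{2r + \left(\frac{\mu}{\sigma} - \frac{\sigma}{2}\right)^2}\right)
\end{aligned} \tag{8}$$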
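
The closed form for the expected utility in the GBM setting can be evaluated directly. The following is a minimal sketch, not the authors' code: the function name `gbm_expected_utility` and its argument names are hypothetical, and it assumes that `mu`, `sigma` and `r` are all expressed in the same time unit.

```python
import math


def gbm_expected_utility(D, r, S, mu, sigma):
    """Expected utility |D| * E[exp(-r * tau_{S+D})] under a GBM.

    D      distance from the limit order to the best offer (non-zero, D > -S)
    r      impatience rate (> 0)
    S      current best offer (> 0)
    mu     GBM drift, sigma: GBM volatility (> 0), same time unit as r
    """
    if D == 0 or D <= -S:
        raise ValueError("D must be non-zero and larger than -S")
    if r <= 0 or S <= 0 or sigma <= 0:
        raise ValueError("r, S and sigma must be positive")
    # Drift and hitting level of the rescaled Brownian motion with drift.
    mu_tilde = mu / sigma - sigma / 2.0
    z = math.log((S + D) / S) / sigma
    # Laplace transform of the first hitting time, scaled by |D|.
    return abs(D) * math.exp(mu_tilde * z - abs(z) * math.sqrt(2.0 * r + mu_tilde ** 2))
```

Because the hitting level may be negative, the same function covers the ask side (`D > 0`) and the bid side (`D < 0`) of the book.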



In order to calibrate our model to market data, we need to estimate the drift and the
volatility of the GBM. We do this by estimating the sample mean and standard deviation
of the log-returns of the market mid-prices at equidistant time points, with a constant time
interval of $\Delta t$. Recall that $M_t$, the mid price at time $t$, is simply the mid-point of the ask
price at level zero $A_t^{(0)}$ and the bid price at level zero $B_t^{(0)}$:

$$M_t := \frac{A_t^{(0)} + B_t^{(0)}}{2}, \qquad B_t^{(0)}, A_t^{(0)} > 0$$



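We do a rolling estimate for each data point $i$ in the equidistantly spaced LOB (i.e.
$t_i - t_{i-1} = \Delta t$) using a standard point estimate of the sample mean, based on the last thirty
observations prior to the current point:

$$m_i(\Delta t) := \frac{1}{30} \sum_{k=1}^{30} \ln\left(\frac{M_{i-k}}{M_{i-k-1}}\right)$$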



Similarly we estimate the standard deviation of the log-returns by:

$$s_i(\Delta t) := \sqrt{\frac{1}{29} \sum_{k=1}^{30} \left(\ln\left(\frac{M_{i-k}}{M_{i-k-1}}\right) - m_i(\Delta t)\right)^2} \tag{9}$$



Notice that $k$ starts at one, ensuring that our current estimate of the sample mean and
standard deviation uses only past data. Recalling the expressions for the expected value
(4) and the variance (5) of the log-returns of a GBM, we first substitute $s_i(\Delta t)^2$ for
$V[l(\Delta t)]$ in (5) to obtain the simple rolling estimate:

$$\sigma^2 \Delta t = s_i(\Delta t)^2 \tag{10}$$

Plugging this expression in (4), and substituting $m_i(\Delta t)$ for $E[l(\Delta t)]$, we estimate:

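$$m_i(\Delta t) = \left(\mu - \frac{s_i(\Delta t)^2}{2 \Delta t}\right) \Delta t \quad\Longrightarrow\quad \mu \Delta t = m_i(\Delta t) + \frac{s_i(\Delta t)^2}{2} \tag{11}$$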
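
The rolling calibration described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' implementation: the function name `rolling_gbm_estimates` is hypothetical, the input is assumed to be a list of equidistant mid-prices, and the sample standard deviation is assumed to use the `window - 1` normalisation.

```python
import math


def rolling_gbm_estimates(mid_prices, dt, window=30):
    """Rolling GBM drift and volatility estimates from equidistant mid-prices.

    For each index i, only the `window` log-returns strictly before i are
    used, i.e. ln(M_{i-k} / M_{i-k-1}) for k = 1..window.
    """
    # log_returns[j] = ln(M_{j+1} / M_j)
    log_returns = [math.log(b / a) for a, b in zip(mid_prices, mid_prices[1:])]
    estimates = []
    for i in range(window + 1, len(mid_prices)):
        past = log_returns[i - window - 1 : i - 1]  # k = window, ..., 1
        m = sum(past) / window
        # Assumption: unbiased sample variance (division by window - 1).
        s = math.sqrt(sum((x - m) ** 2 for x in past) / (window - 1))
        sigma = s / math.sqrt(dt)          # from sigma^2 * dt = s^2
        mu = (m + s ** 2 / 2.0) / dt       # from mu * dt = m + s^2 / 2
        estimates.append({"i": i, "m": m, "s": s, "mu": mu, "sigma": sigma})
    return estimates
```

With `dt` given in seconds, the returned `mu` and `sigma` are per-second quantities, so the impatience rate used alongside them would have to be expressed per second as well.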



Now our model is thoroughly specified and we are in a position to calculate the expected
utility at any point in time, given the value of the impatience rate at that point. Since,
however, the impatience rate is precisely what we would like to estimate, we need an
additional assumption about the whole setting. Considering that the LOB is in
equilibrium when the expected utility of all participants on each side of the book is the same,
so no one has an incentive to put in their order at a different level^5, we assume that the
expected utility is constant over short periods of time, and that it evolves fairly smoothly
in time, as the expectations of the market participants on each side of the book for the
future direction of price movements change through time. By starting at some reasonable
value for the expected utility we can fit the impatience rate stepping through time, with
the objective of keeping the utility stepwise constant and achieving a smooth fit.

A clearer specification of the algorithm for estimating the impatience rate by fitting the
expected utility to market data in a GBM framework is due. The inputs required are the
following:

• $N$ - the number of data points in the reconstructed LOB with equidistant spacing
of $\Delta t$. The results presented in this paper are for $\Delta t$ = 30 seconds;

• "steps" - indicates the number of data points over which the expected utility is to
be held constant, in order to minimize an error function. In this paper we present
results for "steps" = 2, in order to compare the results better to the DEJD model;

• $m$ - a vector of size $N$, containing the rolling estimates for the mean of the observed
log-returns where the time interval between observations is $\Delta t$. Note that $m_i$, the
estimate at time $t_i$, is based on the log-returns at thirty observations prior to $t_i$, so
no peeking into the future is allowed;

• $s$ - a vector of size $N$, containing the rolling estimates for the standard deviation of
the observed log-returns where the time interval between observations is $\Delta t$, where
again estimates are based on past data only;

• $U_0$ - an initial estimate for the expected utility;

• $r_0$ - an initial estimate for the impatience rate.

The outputs are two vectors $U$ and $r$, each of size $N$, containing the expected utility and
the impatience rate through time.

The particular choice of the error function in row 9 is a natural one - it collects
the differences in utilities step by step, scaling each error by the mean of both adjacent
utilities, ensuring the minimization algorithm would not just choose a large $r$ to converge
by essentially driving all utility down to zero. Particular care has to be taken when
choosing $\varepsilon$, $\delta$ and other break criteria of the minimization algorithm in order to ensure
timely convergence. Also one has to consider what $\Delta t$ to choose in order to reduce the number
of time steps of the immense data set the LOB offers per day (in our case about 450,000
observations per day in the LOB organization of Figure 16 in Appendix C). Further,
when choosing "steps", one faces a trade-off between a smoother $U$ for an increasing
number of "steps" on the one hand, and potential convergence problems, as well as a
more noisy $r$, on the other.

^5 Cf. [Ros08].

Algorithm 1 Estimation of the impatience rate in a GBM setting

Inputs: $N$, steps, $m$, $s$, $U_0$, $r_0$
1: $r = r_0$
2: for $i$ = 1:steps:$N$ do
3:    while error($r$) > $\varepsilon$ & Tolerance > $\delta$ do
4:       $r_i := \arg\min_{r > 0.01}$ {error($r$)}
5:       WHERE error($r$) is
6:       for $j = i : i + $steps do
7:          calculate $U_j$ according to (8) using $m_j$, $s_j$, $r$
8:       end for
9:       error($r$) := sum over $j$ of the differences between the adjacent utilities $U_j$ and $U_{j-1}$, each scaled by the mean of $U_j$ and $U_{j-1}$
10:   end while
11: end for
Outputs: $U$, $r$
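
To make the fitting loop concrete, here is a simplified sketch of the stepwise fit. It is an illustration under several assumptions, not the authors' implementation: the names `fit_impatience` and `golden_section_min` are hypothetical; the minimizer is a plain golden-section search on an assumed bracket `[r_min, r_max]` (the paper does not specify the optimizer), so no starting value for the rate is needed; the error uses absolute differences; and the distance-to-fill `D` and best offer `S` at each time step are assumed to be available, since the closed-form utility needs them.

```python
import math


def utility(D, r, S, mu, sigma):
    """Closed-form GBM expected utility |D| * E[exp(-r * tau_{S+D})]."""
    mu_tilde = mu / sigma - sigma / 2.0
    z = math.log((S + D) / S) / sigma
    return abs(D) * math.exp(mu_tilde * z - abs(z) * math.sqrt(2.0 * r + mu_tilde ** 2))


def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize f on [lo, hi]; finds the minimum if f is unimodal there."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def fit_impatience(D, S, mu, sigma, U0, steps=2, r_min=0.01, r_max=50.0):
    """Fit one impatience rate per block of `steps` points, keeping U smooth."""
    N = len(D)
    U, r = [0.0] * N, [0.0] * N
    U_prev = U0
    for i in range(0, N, steps):
        block = range(i, min(i + steps, N))

        def error(rate):
            # Relative differences between adjacent utilities in the block.
            prev, total = U_prev, 0.0
            for j in block:
                u = utility(D[j], rate, S[j], mu[j], sigma[j])
                total += abs(u - prev) / (0.5 * (u + prev))
                prev = u
            return total

        r_i = golden_section_min(error, r_min, r_max)
        for j in block:
            r[j] = r_i
            U[j] = utility(D[j], r_i, S[j], mu[j], sigma[j])
        U_prev = U[block[-1]]
    return U, r
```

Scaling each difference by the mean of the two adjacent utilities keeps the search from simply pushing all utilities to zero with a very large rate. For blocks longer than one point the error need not be unimodal in the rate, in which case the golden-section search may return only a local minimum.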



4 Results For a Pure Diffusion 

We find that, in a GBM setting, the impatience rate $r$ is not constant across levels,
meaning that either the market is not in equilibrium, as contended in [Ros08] and dictated
by the efficient markets hypothesis, or that the GBM framework cannot adequately
describe the LOB equilibrium. We further observe a volatility smile (see Figure 5), which
leads us to assume a jump-diffusion process for the asset price dynamics.

All results presented here are based on the following parameters: $\Delta t$ = 30 sec; "steps"
= 2; $r_0 = 1$ and $U_0$ is calculated according to (8) for $r = r_0 = 1$. It should be noted
that we achieved very similar results for "steps" = 15 and for "steps" = 30, indicating that the
fitting algorithm is robust to the particular time-stepping procedure. The results
are also the same for different starting values $r_0$ and $U_0$, providing evidence that the problem is
well-defined and the fitting procedure is well-posed.

Our results are based on nearly three months of LOB data of the DAX future from 
June to August 2010. Analysis of the data revealed properties of the impatience rate and 
the expected utility which have been very consistent throughout trading days. We will 
illustrate them based on a representative day of our data set - 20 August, 2010 - and refer 
the reader to the appendix for descriptive statistics for each day. 

In Figure 1 we present the impatience rate $r$ as a function of the risk-adjusted relative
distance-to-fill $z$ for the sell side of the LOB. If $r$ were to be constant, strains of flat lines,
each strain indicating a different market mode, should be observable. Instead, for different
levels of expected utility, the impatience rate follows a power law in $z$, governed by some
constant $a > 0$.

The relationship demonstrated in Figure 1 provides evidence that the impatience
rate is primarily a function of the estimated volatility and the distance-to-fill $D$. A first
conclusion is that in the GBM model the impatience rate is not constant across different
levels. In particular it cannot extract solely the information content about the trade-off
between a better fill price and a longer expected waiting time; rather, it still
contains information about the volatility and the distance-to-fill. Next, in Figure 2 we
show a log-log plot which linearizes the power law relationship between $r$ and $z$ and
reveals the different strains of this relationship for different levels of expected utility. The
relative level of expected utility can be interpreted as indicative of prevailing market sentiment,
and therefore different levels of expected utility represent different market modes, and
sharp changes in the absolute value of utility point to a sentiment shift.

Figure 1: Impatience rate ($r$, y-axis) against risk-adjusted relative distance-to-fill ($z$,
x-axis) for the sell side of the book. The fitting procedure reveals a power law for $r$, which
is driven by both the estimated volatility $s$ and the absolute distance-to-fill $D$. (FDAX,
August 20, 2010)

In Figure 3 we present the expected utility compared to the mid-price through time.
Changes in the direction of the expected utility indicate numerical instability and can be
interpreted as an indication that a shift in sentiment of the respective market participants
(here - the sellers) is taking place. A more detailed analysis showed that while price
direction and expected utility changes are indeed concurrent, the latter is not a reliable
predictor of the former.

In Figure 4 a comparison of the impatience rate and the mid-price is given. One
of the objectives of this research is to conclude whether the impatience rate is somehow
indicative of future price movement, or if it contains any other piece of useful information.
The intuition is that on, for example, the sell side of the book, an increasing impatience
rate $r$ indicates that sellers are becoming eager to get out of their holdings, so a price
deterioration can be anticipated.

While testing has revealed that sharp changes in the mid-price are accompanied by a
shift in the opposite direction of the sell-side impatience rate, the direction of $r$ is not a
reliable indicator of future price movements, since there are false signals. Also, price
changes which have not occurred abruptly enough may not be captured by a significant
change in $r$ and thus be missed. Notice also how the impatience rate tends to fluctuate around
a stable state when the price is trending sideways. This is a particular consequence of
the fact that the impatience rate is a function of the distance-to-fill - the fluctuations
indicate that the impatience rate needs to assume a different value for new values of
$D$, in order to keep the expected seller utility smooth. As a result, regression analysis
has shown no significant relationship between the estimated impatience rate and future
mid-price returns. The results for the buy side of the LOB are similar, and a side-by-side
comparison of sell and buy side results can be seen in the appendix.

Figure 2: $\log_{10}$ of the impatience rate ($r$, y-axis) against $\log_{10}$ of the risk-adjusted relative
distance-to-fill ($z$, x-axis) for the sell side of the book. Notice the different linear
(power-law on a linear scale) strains, which indicate the same power relationship between $r$
and $z$ for different levels of the expected utility. (FDAX, August 20, 2010)



Figure 3: The expected seller utility $U$ (left axis) and the mid-price (right axis) through
time (x-axis) on a single trading day. Notice the initial drop from the initial estimate to
a level which reflects the actual expected utility given the impatience rate $r$, and the
subsequent smoothness of the evolution of the utility. Jumps indicate a change in the
utility level and correspond to a different linear strain in the relationship between $r$ and
$z$, as demonstrated in the previous figure. (FDAX, August 20, 2010)

5 Volatility Smile

The results reveal that the impatience rate is not constant across LOB levels and in this
model framework cannot be used as a consistent quantifier of the trade-off between a
better fill price and a longer expected time-to-execution. Further analysis suggested that the
impatience rate in a GBM model is a function of the distance-to-fill and of the estimated
volatility. Intuitively the market participants who place their orders further out in the
LOB imply a much higher volatility than the observed one. It seems as if traders who place
orders at higher levels in the LOB are betting on a sharp price change in the desired
direction. In order to analyse this further, we take a look at the LOB implied volatility,
which we define to be the volatility $\sigma$ for which

$$U_D(r) = U_D(r, \sigma) = c \tag{14}$$

for some positive constant $c$. As we have already established, the impatience rate depends
on $D$ and on $\sigma$, so we would like to separate and measure the two effects. To this end
we choose an arbitrary, but reasonable, constant $c$ for the utility level and also fix the
impatience rate $r$ at some constant level. Both are held constant throughout a whole trading
day, and we fit for $\sigma$, to derive the implied volatility which keeps the impatience rate
constant at $r$ and the expected utility level $U_D(r, \sigma)$ constant at a level $c$ for the entire
trading day. What we discover is a volatility smile, being implied by the LOB, as is
demonstrated in Figure 5. Essentially this shows that the GBM underweights limit orders
at higher levels - the process' volatility is insufficient for it to reach these limits in due
time, so the volatility has to be tuned up to account for the dynamic equilibrium, and for
market efficiency to hold.
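
The LOB implied volatility defined above has to be found numerically. The following sketch solves the defining equation by bisection. It is an illustration only: the paper does not state which root finder was used, the names `implied_volatility` and `utility` are hypothetical, and the default of zero drift is an assumption under which the closed-form GBM utility is increasing in the volatility, so the root in the bracket is unique.

```python
import math


def utility(D, r, S, mu, sigma):
    """Closed-form GBM expected utility |D| * E[exp(-r * tau_{S+D})]."""
    mu_tilde = mu / sigma - sigma / 2.0
    z = math.log((S + D) / S) / sigma
    return abs(D) * math.exp(mu_tilde * z - abs(z) * math.sqrt(2.0 * r + mu_tilde ** 2))


def implied_volatility(D, S, r, c, mu=0.0, lo=1e-6, hi=5.0, tol=1e-12):
    """Volatility sigma in [lo, hi] with utility(D, r, S, mu, sigma) = c."""
    def g(sig):
        return utility(D, r, S, mu, sig) - c

    g_lo, g_hi = g(lo), g(hi)
    if g_lo * g_hi > 0:
        raise ValueError("utility level c is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_lo * g_mid <= 0:
            hi = mid                 # root lies in the lower half
        else:
            lo, g_lo = mid, g_mid    # root lies in the upper half
    return 0.5 * (lo + hi)
```

Evaluating `implied_volatility` for every observed level, with `r` and `c` held fixed over the day, and plotting the result against `(S + D) / S` would give a scatter of the kind described for the smile.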

The empirical work shows that in a GBM setting there is no unifying
impatience rate across the different levels in the LOB. It has turned out that it is a function
of both the volatility term $\sigma$ of the stochastic process and $|D|$, the distance-to-fill. It
is inversely related to the volatility, i.e. the higher the volatility, the lower the impatience
rate, on either side of the book. It means that it becomes "easier" for the process to
reach an order which is at a higher level in the book, which is in turn necessary due to the
assumption that the expected utility is the same at all levels of the book, and evolves
smoothly through time.

In the same way, the impatience rate is a function of the distance-to-fill. It plays the
role of a "volatility-compensator", keeping the expected utility $U$ constant across levels
for different values of $|D|$. The impatience rate needs to be very small for large $|D|$ in
order to compensate for the decrease in utility, as $\tau$ becomes too big. Since the volatility
is fixed by empirical observations, and $E[\tau]$ is known^6 to grow exponentially with rising
distance, it is a direct consequence of the GBM framework that $r$ follows an exponential
decay law for an increasing $|D|$ and a falling $\sigma$.

Figure 4: The seller impatience rate $r$ (left axis) and the mid-price (right axis) through
time (x-axis) on a single trading day. Intuitively a rise in the seller impatience rate should
precede or accompany a fall in the price. (FDAX, August 20, 2010)

The inverse power-law between the impatience rate and the observed volatility is
especially clearly observable in the volatility smile. This surprising discovery highlights the
fact that in order to achieve utility equilibrium one has to tune up the process' volatility,
so that limit orders placed further away from the mid-price have a higher chance of
execution. Intuitively one may argue that market participants place limit orders further
out in the LOB because they anticipate a large block of orders being placed at once, so
that lower level limit orders get immediately filled, leaving the agent's order as the best
or even filling it.

A critical argument for jump augmentation is based on the observed "volatility smile",
a phenomenon well known from option pricing.^7 There are already many models in options
theory which account for the smile. Regarding the principal idea of the solution, there
are two broad classes of approaches to incorporate this phenomenon in the price process.



^6 Cf. [BS96].

^7 Cf. [Der03], [SP99] for more about the volatility smile in the context of option pricing.

Figure 5: The volatility smile, as implied by the LOB. A scatter plot of the implied
volatility $\sigma$ (y-axis) and the relative distance-to-fill $(S+D)/S$ (x-axis) forms what resembles
a volatility smile as known from equity options pricing. The plot shows a whole day's
worth of data; $\sigma$ values to the left of one are derived from the bid side of the LOB, and
those to the right from the ask side. This instance of the smile was derived for $U = 0.2355$
and $r = 0.5$. (FDAX, August 20, 2010)

One is to assume a stochastic process for the volatility term in front of the Brownian
motion. The other is to add a jump process to the Brownian motion. While it might seem
somewhat natural that the volatility itself should also be a process rather than a constant
term, the dynamics that are usually used to model the evolution of the volatility through
time are not necessarily intuitive.^8 On the other hand, the idea that price processes have
jumps is considered characteristic of financial time series^9 and is readily observable^10. In
combination with the intuition that limit order traders position themselves at the outer
levels in anticipation of a block-trade, or a sharp price movement, we are led to extend
the GBM to a jump-diffusion by incorporating a jump process, i.e. a process of the form:

$$dS_t = \mu S_t\, dt + \sigma S_t\, dB_t + S_t\, dJ_t, \qquad J_t = \sum_{i=1}^{N_t} Y_i$$

where $J_t$ is a time-homogeneous compound Poisson process with Poisson counting process
$N_t$, whose jump sizes $(Y_i)_{i \in \mathbb{N}}$ are a family of independently and identically distributed
(iid) random variables.

^8 Cf. [Bro09].

^9 Cf. [Bro09], [BD02], [BD89], [MFE05].

^10 Cf. [Bro09], [MLMK08], [AM10].

6 A Jump-Diffusion Setting



Led by the observation of a volatility smile and the intuition that far-off limit orders are
underweighted, in the sense that in the geometric Brownian motion (GBM) framework
high-level orders are too difficult to hit, we extend our asset price model to include jumps.
In this way we will be able to account for the observed empirical phenomena and serve
intuition. Due to the increased flexibility of the model we also expect to obtain a
better fit of a period-wise constant impatience rate to the market data.

Since the success of the Black-Scholes-Merton formula for equity option valuation,
which is based on a geometric Brownian motion, a number of jump-diffusion models have
been proposed as an extension to the original model. We are going to use the Double
Exponential Jump Diffusion (DEJD) model, as proposed by Kou in [Kou02] and
extensively studied by Kou and Wang in [KW03]. Our choice is led by our specific interest
in the Laplace transform of the first passage time of the asset price. As is demonstrated
in [KW03], the model offers a closed-form solution (up to finding the zeros of a rational
function) for the Laplace transform.

The DEJD model of the asset mid-price $(S_t)_{t \ge 0}$ is specified by the stochastic differential
equation

$$dS_t = S_t \mu\, dt + S_t \sigma\, dB_t + S_t\, d\left(\sum_{i=1}^{N_t} (V_i - 1)\right) \tag{15}$$

where $N_t \sim \mathrm{Poi}(\lambda t)$ is a Poisson process with intensity $\lambda$, and

$$Y := \log(V_i) \sim \mathrm{dexp}(p, q, \eta_1, \eta_2) \tag{16}$$

are iid random variables, distributed according to the asymmetric double-exponential law
with the density

$$f_Y(t) = p\, \eta_1 \exp(-\eta_1 t)\, 1_{\{t \ge 0\}} + q\, \eta_2 \exp(\eta_2 t)\, 1_{\{t < 0\}}, \qquad \eta_1 > 2, \ \eta_2 > 2, \quad p, q \ge 0, \ p + q = 1 \tag{17}$$

The model parameters can be understood as follows: at any time $p$ is the probability of an
upward jump, and $q = 1 - p$ is the probability of a downward jump. The mean jump-sizes
are $1/\eta_1$ and $1/\eta_2$ for an up-jump and a down-jump respectively. At each point in time
only one jump can occur, and the occurrence of jumps is modelled by a homogeneous
Poisson process with a constant intensity rate of $\lambda$, meaning that the mean number of
jumps up to time $t$ is $\lambda t$. All driving processes of the model price, $N_t$, $B_t$ and $(Y_i)_{i \in \mathbb{N}}$, are
assumed to be independent, although in [Kou02] it is suggested that this assumption can
be relaxed. For the purposes of this paper this possibility will not be further investigated.

Theorem 6.1. The solution of the SDE (15) is given by the process

$$S_t = S \exp\Big\{\Big(\mu - \frac{\sigma^2}{2}\Big)t + \sigma B_t\Big\} \prod_{i=1}^{N_t} V_i \tag{18}$$

Proof. See the appendix. ■

For convenience we adopt the following representation of $Y$:

$$Y = U\xi^+ - (1 - U)\xi^- \tag{19}$$

where $U \sim \mathrm{Ber}(p)$ is a Bernoulli random variable, indicating that an up-jump occurs with
probability $p$, and $\xi^+$ and $\xi^-$ are exponential random variables with means $1/\eta_1$ and $1/\eta_2$
respectively, corresponding to the mean up-jump and mean down-jump sizes. All three
random variables are assumed to be independent.
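
The representation (19) translates directly into a sampling routine. The following Python sketch is an illustration only and not part of the original calibration code; the function names `sample_jump` and `sample_mean`, the seed and the parameter values in the usage note are hypothetical.

```python
import random


def sample_jump(p, eta1, eta2, rng):
    """Draw one log-jump Y = U*xi_plus - (1 - U)*xi_minus.

    U ~ Bernoulli(p); xi_plus and xi_minus are exponential with
    means 1/eta1 and 1/eta2 (i.e. rates eta1 and eta2).
    """
    if rng.random() < p:                # U = 1: up-jump
        return rng.expovariate(eta1)
    return -rng.expovariate(eta2)       # U = 0: down-jump


def sample_mean(p, eta1, eta2, n, seed=0):
    """Monte Carlo estimate of E[Y] from n independent draws."""
    rng = random.Random(seed)
    return sum(sample_jump(p, eta1, eta2, rng) for _ in range(n)) / n
```

For instance, `sample_mean(0.4, 10.0, 5.0, 200_000)` should lie close to $p/\eta_1 - q/\eta_2 = -0.08$, up to Monte Carlo error.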

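Theorem 6.2. The asset mid-price model specified by (18) and (19), where $Y = \log(V)$,
has the following properties:

a) $\mathbb{E}[Y] = \frac{p}{\eta_1} - \frac{q}{\eta_2}$ and $\mathbb{V}[Y] = pq\Big(\frac{1}{\eta_1} + \frac{1}{\eta_2}\Big)^2 + \Big(\frac{p}{\eta_1^2} + \frac{q}{\eta_2^2}\Big)$;

b) $\mathbb{E}[V] = p\frac{\eta_1}{\eta_1 - 1} + q\frac{\eta_2}{\eta_2 + 1}$ and $\mathbb{V}[V] = p\frac{\eta_1}{\eta_1 - 2} + q\frac{\eta_2}{\eta_2 + 2} - \Big(p\frac{\eta_1}{\eta_1 - 1} + q\frac{\eta_2}{\eta_2 + 1}\Big)^2$;

c) The first two moments of the "Poisson product" $P_t := \prod_{i=1}^{N_t} V_i$ are

• $\mathbb{E}[P_t] = \exp\big(t\lambda(\mathbb{E}[V] - 1)\big)$ and
• $\mathbb{E}[P_t^2] = \exp\big(t\lambda(\mathbb{E}[V^2] - 1)\big)$;

d) $\mathbb{E}[S_t] = S\exp(\mu t)\,\mathbb{E}[P_t]$ and $\mathbb{V}[S_t] = S^2 e^{2\mu t}\big(e^{\sigma^2 t}\,\mathbb{E}[P_t^2] - \mathbb{E}[P_t]^2\big)$;

e) For $l(\Delta t)$, the log-returns over a period $\Delta t$, we have:

• $\mathbb{E}[l(\Delta t)] = \Big(\mu - \frac{\sigma^2}{2}\Big)\Delta t + \lambda\Big(\frac{p}{\eta_1} - \frac{q}{\eta_2}\Big)\Delta t$ and
• $\mathbb{V}[l(\Delta t)] = \sigma^2\Delta t + \lambda\Delta t\Big(\frac{2p}{\eta_1^2} + \frac{2q}{\eta_2^2}\Big)$.

Proof. See the appendix. ■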



Observing that $\mathbb{V}[V] = \infty$ for $\eta_1 \le 2$, as shown in the proof of b), we are led to
impose the constraint $\eta_1 > 2$ when modelling the asset price. Since there is no intuitive reason
to prefer a priori any price direction, we make the same assumption for $\eta_2$ in (17). It
essentially means that the mean jump sizes $1/\eta_1$ and $1/\eta_2$ cannot exceed 50%.
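
As a quick numerical companion to Theorem 6.2 e), the following Python sketch evaluates the mean and variance of the log-returns from the model parameters and enforces the constraints $\eta_1, \eta_2 > 2$. It is an illustration under the formulas stated in the theorem; the function name `dejd_logreturn_moments` is hypothetical.

```python
def dejd_logreturn_moments(mu, sigma, lam, p, eta1, eta2, dt):
    """Mean and variance of the DEJD log-return over a period dt.

    Uses E[Y] = p/eta1 - q/eta2 and E[Y^2] = 2p/eta1^2 + 2q/eta2^2
    for the double-exponential log-jump size Y.
    """
    if eta1 <= 2.0 or eta2 <= 2.0:
        raise ValueError("the model requires eta1 > 2 and eta2 > 2")
    q = 1.0 - p
    mean_jump = p / eta1 - q / eta2
    second_moment_jump = 2.0 * p / eta1 ** 2 + 2.0 * q / eta2 ** 2
    mean = (mu - 0.5 * sigma ** 2) * dt + lam * dt * mean_jump
    variance = sigma ** 2 * dt + lam * dt * second_moment_jump
    return mean, variance
```

With `lam = 0` the result reduces to the GBM values $(\mu - \sigma^2/2)\Delta t$ and $\sigma^2\Delta t$, which is a convenient sanity check.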

6.1 First Passage Time Results 

Here we present the results of Kou and Wang for the first passage times of a double
exponential jump diffusion process, as given in [KW03]. Consider the DEJD process

$$X_t = \sigma B_t + \mu t + \sum_{i=1}^{N_t} Y_i, \qquad X_0 := 0, \tag{20}$$

where $(B_t)_{t\ge 0}$ is a standard Brownian motion, $(N_t)_{t\ge 0}$ is a Poisson process with intensity
rate $\lambda$, and $\mu$ and $\sigma > 0$ are respectively the constant drift and the volatility of the diffusion
part of the process. The jump sizes $(Y_i)_{i\in\mathbb{N}}$ are independent and identically
distributed according to a double-exponential distribution with density $f_Y$ as given in
(17). We are interested in the Laplace transform of $\tau_b$, the random variable specified by
the first passage time of a boundary $b$:

$$\tau_b := \inf_{t \ge 0}\{X_t \ge b\}, \qquad b > 0. \tag{21}$$
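The infinitesimal generator of the jump diffusion process (20) is given by

$$\mathcal{L}u(x) = \frac{1}{2}\sigma^2 u''(x) + \mu u'(x) + \lambda\int_{-\infty}^{\infty}\big(u(x + y) - u(x)\big) f_Y(y)\,dy \tag{22}$$

for all twice continuously differentiable functions $u(x)$. Further, suppose $\theta \in\, ]-\eta_2, \eta_1[$.
The moment generating function of the jump size $Y$ is given by:

$$\mathbb{E}[\exp(\theta Y)] = \frac{p\,\eta_1}{\eta_1 - \theta} + \frac{q\,\eta_2}{\eta_2 + \theta}, \tag{23}$$

from which the moment generating function of $X_t$ can be obtained as

$$\varphi(\theta, t) := \mathbb{E}[\exp(\theta X_t)] = \exp(G(\theta)t),$$

where the function $G(\cdot)$ is defined as

$$G(x) := x\mu + \frac{1}{2}x^2\sigma^2 + \lambda\Big(\frac{p\,\eta_1}{\eta_1 - x} + \frac{q\,\eta_2}{\eta_2 + x} - 1\Big). \tag{24}$$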

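Lemma 6.1. The equation

$$G(x) = \alpha \quad \text{for all } \alpha > 0 \tag{25}$$

has exactly four roots: $\beta_{1,\alpha}$, $\beta_{2,\alpha}$, $-\beta_{3,\alpha}$, $-\beta_{4,\alpha}$, where

$$0 < \beta_{1,\alpha} < \eta_1 < \beta_{2,\alpha} < \infty, \qquad 0 < \beta_{3,\alpha} < \eta_2 < \beta_{4,\alpha} < \infty.$$

Proof. Cf. [KW03] (page 507, Lemma 2.1). ■

Theorem 6.3. For any $\alpha \in\, ]0, +\infty[$, let $\beta_{1,\alpha}$ and $\beta_{2,\alpha}$ be the only positive roots of the
equation

$$\alpha = G(x),$$

where $0 < \beta_{1,\alpha} < \eta_1 < \beta_{2,\alpha} < +\infty$. Then the Laplace transform of $\tau_b$ is given by:

$$\mathbb{E}\big[e^{-\alpha\tau_b}\big] = \frac{\eta_1 - \beta_{1,\alpha}}{\eta_1}\,\frac{\beta_{2,\alpha}}{\beta_{2,\alpha} - \beta_{1,\alpha}}\,e^{-b\beta_{1,\alpha}} + \frac{\beta_{2,\alpha} - \eta_1}{\eta_1}\,\frac{\beta_{1,\alpha}}{\beta_{2,\alpha} - \beta_{1,\alpha}}\,e^{-b\beta_{2,\alpha}}. \tag{26}$$

Proof. Cf. [KW03] (page 509, Theorem 3.1). ■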

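Again, a simple transform is needed in order to apply the result from the previous
theorem to our process given by (18). For $D > 0$ consider:

$$\begin{aligned}
\tau_{S+D} &= \inf_{t\ge 0}\{S_t \ge S + D\} \\
&= \inf_{t\ge 0}\Big\{S\exp\Big\{\Big(\mu - \frac{\sigma^2}{2}\Big)t + \sigma B_t + \sum_{i=1}^{N_t} Y_i\Big\} \ge S + D\Big\} \\
&= \inf_{t\ge 0}\Big\{\Big(\mu - \frac{\sigma^2}{2}\Big)t + \sigma B_t + \sum_{i=1}^{N_t} Y_i \ge \log\Big(\frac{S + D}{S}\Big)\Big\}
\end{aligned} \tag{27}$$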
So the expected utility for the ask side of the LOB (i.e. for the sellers) is given by
substituting $z$ for $b$ in (26), and calculating $\beta_{1,r}$ and $\beta_{2,r}$ by solving (25) with $\tilde{\mu}$ and $z$:

$$G(\beta_i) = r, \qquad i \in \{1, 2\},$$

$$\beta_i\tilde{\mu} + \frac{1}{2}\beta_i^2\sigma^2 + \lambda\Big(\frac{p\,\eta_1}{\eta_1 - \beta_i} + \frac{q\,\eta_2}{\eta_2 + \beta_i} - 1\Big) = r,$$

where the two roots $\beta_1$ and $\beta_2$ are in the following intervals:

$$0 < \beta_{1,r} < \eta_1 < \beta_{2,r} < \infty.$$

Observe that, while in the GBM setting we had direct access to a formula for the first
hitting time regardless of whether the boundary was above or below the starting point
of the process, Theorem 6.3 only provides a formula for higher boundaries. So we need to
make a distinct transform for the bid (i.e. buy) side of the LOB. Now, consider for $D > 0$
the following:



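$$\begin{aligned}
\tau_{S-D} &= \inf_{t\ge 0}\{S_t \le S - D\} \\
&= \inf_{t\ge 0}\Big\{-\Big(\mu - \frac{\sigma^2}{2}\Big)t - \sigma B_t - \sum_{i=1}^{N_t} Y_i \ge \log\Big(\frac{S}{S - D}\Big)\Big\} \\
&\overset{d}{=} \inf_{t\ge 0}\Big\{-\Big(\mu - \frac{\sigma^2}{2}\Big)t + \sigma B_t - \sum_{i=1}^{N_t} Y_i \ge \log\Big(\frac{S}{S - D}\Big)\Big\} \\
&\overset{d}{=} \inf_{t\ge 0}\Big\{-\Big(\mu - \frac{\sigma^2}{2}\Big)t + \sigma B_t + \sum_{i=1}^{N_t} \tilde{Y}_i \ge \log\Big(\frac{S}{S - D}\Big)\Big\}
\end{aligned} \tag{28}$$

where the first transform in probability is due to the reflection principle^12, and the second
is due to

$$Y = U\xi^+ - (1 - U)\xi^- \implies -Y = (1 - U)\xi^- - U\xi^+ \tag{29}$$

where $Y \sim \mathrm{dexp}(p, q, \eta_1, \eta_2)$, so that for $\tilde{Y} \sim \mathrm{dexp}(q, p, \eta_2, \eta_1)$

$$\tilde{Y} \overset{d}{=} -Y. \tag{30}$$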

In the DEJD model there are significantly more parameters to be either
estimated or fitted. While in the GBM setting we could extract the impatience rate
and expected utility by estimating the drift and volatility of log-returns, and imposing
constraints such as keeping $r$ and the expected utility constant, a similar strategy would have only
limited success in the DEJD framework. There is a fundamental trade-off between keeping the
model as flexible as possible and imposing new constraints: the former
carries a significant risk of overfitting, and the latter might not converge for a large class of
pre-set parameters. Next we show how to separate the diffusive volatility from the jump-part
contribution to total process volatility, which reduces the number of parameters
that need to be fitted.
12 Cf. [Kli06].
6.2 Parameter Calibration to Market Data

The main difference in this process, compared to a diffusion, is that a point estimate of
$\mathbb{V}[l(\Delta t)]$ from observed data is an estimate for the sum of the diffusive and the jump-part
variances, as is shown in Theorem 6.2. We will therefore make use of the bipower variation
introduced by Barndorff-Nielsen and Shephard in [BNS04]. Consider the realized variance
over a period $N\Delta t$^13

$$RV_{N\Delta t}\Big(\frac{1}{m}\Big) = \sum_{j=1}^{N}\big(R_j^{\Delta t}(N)\big)^2,$$

where $m$ is the sampling frequency and $R_j^{\Delta t}(N)$ is the log return in the time span from
$(j - 1)\Delta t$ to $j\Delta t$. Notice that $RV_{N\Delta t}$ is an estimate of the total process variance over the
whole sampling period $N\Delta t$. It can be shown^14 that

$$\text{for } m \to \infty: \quad RV_{N\Delta t}\Big(\frac{1}{m}\Big) \to \mathbb{V}[l(N\Delta t)],$$

i.e. by increasing the sampling frequency over the same interval $N\Delta t$, the realized variance
converges to the process' total variance over the period $N\Delta t$. Consider further the
bipower variation:

$$BV_{N\Delta t}\Big(\frac{1}{m}\Big) = \frac{\pi}{2}\sum_{j=2}^{N}\big|R_j^{\Delta t}(N)\big|\,\big|R_{j-1}^{\Delta t}(N)\big|,$$

which is shown to converge for an increasing sampling frequency to the diffusive part of
the total variance of the process over the whole sampling period $N\Delta t$, i.e.

$$\text{for } m \to \infty: \quad BV_{N\Delta t}\Big(\frac{1}{m}\Big) \to \sigma^2 N\Delta t.$$



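So from the log-returns $l(\Delta t)$ we can estimate the diffusive $\sigma^2$ by:

$$\sigma^2\Delta t = \frac{BV_{N\Delta t}\big(\frac{1}{m}\big)}{N} \tag{31}$$

and for the jump part we can use:

$$\mathbb{V}[l(\Delta t)] - \sigma^2\Delta t = \frac{RV_{N\Delta t}\big(\frac{1}{m}\big) - BV_{N\Delta t}\big(\frac{1}{m}\big)}{N},$$

which in combination with the estimate for $\sigma^2$ and with Theorem 6.2 e) leads to

$$\lambda\Delta t\Big(\frac{2p}{\eta_1^2} + \frac{2q}{\eta_2^2}\Big) = \frac{RV_{N\Delta t}\big(\frac{1}{m}\big) - BV_{N\Delta t}\big(\frac{1}{m}\big)}{N} \tag{32}$$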



quantifying the jump-part contribution to the process' total variance. Due to sampling
errors, however, this expression may take a negative value, so in order to avoid this in our
empirical work we cap the bipower variation by the realized variance:

$$\widetilde{BV}_{N\Delta t}\Big(\frac{1}{m}\Big) := \min\Big\{BV_{N\Delta t}\Big(\frac{1}{m}\Big),\ RV_{N\Delta t}\Big(\frac{1}{m}\Big)\Big\} \tag{33}$$
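
The split of the observed variance into a diffusive part and a jump part can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard realized-variance and bipower-variation estimators (sum of squared returns, and $\pi/2$ times the sum of products of adjacent absolute returns); the function names are hypothetical and the input is a plain list of log-returns over $N$ periods.

```python
import math


def realized_variance(returns):
    """Sum of squared log-returns over the sampling window."""
    return sum(r * r for r in returns)


def bipower_variation(returns):
    """(pi/2) * sum of products of adjacent absolute log-returns."""
    return (math.pi / 2.0) * sum(
        abs(a) * abs(b) for a, b in zip(returns[1:], returns[:-1])
    )


def split_variance(returns):
    """Per-period diffusive and jump variance estimates.

    The bipower variation is capped by the realized variance so that
    the jump part cannot become negative.
    """
    n = len(returns)
    rv = realized_variance(returns)
    bv = min(bipower_variation(returns), rv)
    return bv / n, (rv - bv) / n
```

By construction the two returned values add up to the realized variance per period, so a single large return among small ones shows up almost entirely in the jump part.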



13 This presentation is based on the exposition in [AM10] with a number of alterations to better suit
the purpose of this paper.
14 Cf. [BNS04].
Again, as in the GBM setting, we choose $N = 30$ and base our rolling estimates at $t$ on
observed log-returns from $t - N - 1$ to $t - 1$. Still, including the diffusive drift $\mu$, we have
a total of five parameters (as $q = 1 - p$) to estimate from a single constraint. For this
reason we assume that the process has no drift, arguing that price level changes come
about jump-wise, and that price movement between jumps is driven only by Brownian
motion scaled by its diffusive $\sigma$. We can also add another constraint from the observed
rolling mean of the log-returns, which by Theorem 6.2 e) with $\mu = 0$ reads:

$$-\frac{\sigma^2}{2}\Delta t + \lambda\Delta t\Big(\frac{p}{\eta_1} - \frac{q}{\eta_2}\Big) = \frac{1}{N}\sum_{j=1}^{N} R_j^{\Delta t}(N) \tag{34}$$

Now there are four free parameters and two constraints. Additional assumptions and
constraints may include any or all of the following (superscripts indicate the time of
the data point):

• $\eta_1^t = \eta_2^t$, or $(\eta_1^t - \eta_2^t)^2 < \varepsilon$, which amounts to assuming that mean jump-sizes are
essentially the same for up-jumps and down-jumps. The only source of asymmetry
in the model then derives from the probability $p$ that an up-jump occurs, given that the
price process jumps at all.

• $(\lambda^t - \lambda^{t-1})^2 < \varepsilon$;

• $(\eta_i^t - \eta_i^{t-1})^2 < \varepsilon$ for $i = 1, 2$.

All of these constraints are plausible. The first one would be the most stringent
and effective, as the estimation problem would then be to derive three parameters from
two constraints. However, readily available optimization procedures did not converge
when this constraint was introduced, for reasons explained in the next paragraph.

Another feature of the fitting is that we now need to fit the impatience rate and
expected utility for the bid and for the ask side simultaneously. In the GBM setting
the process parameters for the bid and the ask side were the same and did not need fitting,
which allowed us to solve for the impatience rate and the expected utility on each side of
the book consecutively. In the DEJD setting we must assume that sellers and buyers have
the same view of the underlying process, so we need to fit two impatience parameters $r^{bid}$
and $r^{ask}$ in parallel, while still under the assumption that each of the expected utilities
$U^{bid}$ and $U^{ask}$ is step-wise approximately constant and evolves smoothly. This
makes the fitting procedure much more difficult and potentially unstable with off-the-shelf
optimization techniques. In fact, this is the reason why the optimization fails to converge
once the more stringent constraints from the above list are introduced: for a target function
similar to the one from the GBM framework, which minimizes the error for one side of the
book only, we have found that an optimization algorithm with all of the above
constraints does converge.

On the one hand, fitting both sides simultaneously limits the constraints that available
optimization techniques can accommodate. On the other hand, fitting the stochastic process'
driving parameters on two separate and independent sets of data in parallel is in and
of itself an additional constraint, which should discipline the optimization algorithm and
provide insurance against parameter redundancy. Since we can optimize one side of the
book imposing all of the constraints, we suppose that a more careful analysis of the parameter
domain for the optimization problem should lead to an algorithm which converges. This,
however, is beyond the scope of this paper, as it is a numerically challenging problem
which departs from the research of price dynamics implications of the LOB. Furthermore,
with only the last constraint from the list above, we achieve reasonable results.

The following is a pseudo-code representation of Algorithm 2 for the estimation of the
impatience rate in a DEJD setting. The inputs are starting utilities and impatience rates
for both sides of the book, and a vector of process starting parameters $P$, which cannot
be directly estimated and need fitting, that is $\lambda$, $\eta_1$, $\eta_2$ and $p$. The outputs are two vectors
of utilities, two vectors of impatience rates and a set of price process parameters derived
by our fitting procedure.

Algorithm 2 Estimation of the impatience rate in a DEJD setting
Inputs: $N$, steps, $U_0^{bid}$, $r_0^{bid}$, $U_0^{ask}$, $r_0^{ask}$, $\varepsilon$, $m$
 1: $r_0 := (r_0^{bid}, r_0^{ask}) \in \mathbb{R}_+^2$
 2: $U_0 := (U_0^{bid}, U_0^{ask})$
 3: for i = 1:N do
 4:   while error(r) > $\varepsilon$ & Tolerance > $\delta$ do
 5:     $r_i := \arg\min_{r > 0.01}\{\mathrm{error}(r)\}$
 6:       s.t. constraints (31), (32), (33), (34) & $(\eta_i^t - \eta_i^{t-1})^2 < \varepsilon$ for $i = 1, 2$
 7:       WHERE error(r) is
 8:         calculate $U_i^{ask}$ according to (26), (27), using $P$, $r^{ask}$
 9:         calculate $U_i^{bid}$ according to (26), (28), using $P$, $r^{bid}$



10: error r 



11: error(r) 




0.5(f/f + U^_^^) j \ 0.5{UY"^ + U^'"^ 

/^ask ^ask \ 




12: error(r) =error(r)/4 
13: end while 
14: end for 

Outputs: $U = (U^{bid}, U^{ask}) \in \mathbb{R}^{N\times 2}$, $r = (r^{bid}, r^{ask}) \in \mathbb{R}_+^{N\times 2}$, $P \in \mathbb{R}^{N\times 4}$



As already noted, this problem is numerically challenging. Very careful tuning of the
optimization parameters concerning convergence tolerance is needed. The problem is further
complicated by the fact that the function $G$ (cf. Theorem 6.3) is so poorly conditioned
that finding its roots in the specified intervals (where the function is differentiable)
becomes very hard. For this reason we must use an efficient implementation of the very
robust but time-consuming bisection^15 algorithm. Note that in order to apply bisection,
the function $G(x) - r$ needs to be continuous over the specified interval and to take values
of different signs at the interval boundaries; monotonicity additionally guarantees that the
root is unique. This is verified in Lemma 6.1.
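
To make the root-finding step concrete, the following Python sketch brackets the two positive roots of $G(x) = \alpha$ on either side of $\eta_1$, finds them by plain bisection, and then evaluates the Laplace transform of the first passage time in the form given by Kou and Wang. It is a minimal illustration and not the implementation used for the results in this paper; all function names, the bracketing offsets (`rel_gap`) and the tolerances are assumptions, and the sketch presumes $\alpha, \lambda, \sigma > 0$ and $0 < p \le 1$.

```python
import math


def G(x, mu, sigma, lam, p, eta1, eta2):
    """Exponent G(x) with E[exp(x * X_t)] = exp(G(x) * t)."""
    q = 1.0 - p
    return (x * mu + 0.5 * x * x * sigma ** 2
            + lam * (p * eta1 / (eta1 - x) + q * eta2 / (eta2 + x) - 1.0))


def bisect(f, lo, hi, tol=1e-12, max_iter=200):
    """Plain bisection; f(lo) and f(hi) must have opposite signs."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo * f_hi > 0.0:
        raise ValueError("root is not bracketed")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_mid == 0.0 or hi - lo < tol:
            return mid
        if f_lo * f_mid < 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def positive_roots(alpha, mu, sigma, lam, p, eta1, eta2, rel_gap=1e-9):
    """Roots 0 < beta1 < eta1 < beta2 of G(x) = alpha."""
    f = lambda x: G(x, mu, sigma, lam, p, eta1, eta2) - alpha
    # On (0, eta1): f(0) = -alpha < 0 and f -> +inf as x -> eta1 from below.
    beta1 = bisect(f, 0.0, eta1 * (1.0 - rel_gap))
    # On (eta1, inf): f -> -inf just above eta1; expand hi until f(hi) > 0.
    hi = 2.0 * eta1
    while f(hi) <= 0.0:
        hi *= 2.0
    beta2 = bisect(f, eta1 * (1.0 + rel_gap), hi)
    return beta1, beta2


def laplace_first_passage(alpha, b, mu, sigma, lam, p, eta1, eta2):
    """E[exp(-alpha * tau_b)] for an upper barrier b >= 0."""
    b1, b2 = positive_roots(alpha, mu, sigma, lam, p, eta1, eta2)
    w1 = (eta1 - b1) / eta1 * b2 / (b2 - b1)
    w2 = (b2 - eta1) / eta1 * b1 / (b2 - b1)
    return w1 * math.exp(-b * b1) + w2 * math.exp(-b * b2)
```

A useful consistency check is that the two weights sum to one, so the transform equals 1 at $b = 0$ and decreases as the barrier moves away from the current price.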



7 Results for a Jump-Diffusion

The double-exponential jump-diffusion (DEJD) proves to be much better suited to model
an equilibrium in the LOB under the assumption of a constant impatience rate across
levels, given an established market regime (see Figure 8). The induced fat-tailed
distribution of log-returns makes orders further out in the book more accessible than in a
GBM setting. We further establish that, in order to achieve an equilibrium of the LOB,
as reflected by the consistency of the impatience rate, the underlying process cannot be a
GBM.

15 Cf. [BFTn]

We illustrate our findings by showing a representative day, again August 20, 2010,
and again only showing results for the sell-side of the LOB. Figure 6 shows the expected
utility in the DEJD setting. While still essentially flat for established market regimes, its
choppiness is indicative of partial numerical instability, a problem outlined in the previous
section. Nonetheless, the success of the DEJD over the GBM is clearly demonstrated in
Figure 7, where the evolution of the impatience rate is shown. It is very stable for
established market modes and changes spike-wise to indicate that a new market regime
has been established. Regression analysis has, however, shown that, as in the GBM case,
the impatience rate is not indicative of future price movements. It serves very much as
an anchor which allows for consistent comparison of limit orders at different levels in the
LOB, as long as market sentiment remains unchanged.

Figure 8 clearly shows that in a DEJD setting an approximately constant impatience
rate can be derived. It shows the impatience rate as a function of the distance-to-fill in
the course of a whole day. Highlighted are the values for two periods when the market regime
was unchanged and the expected utility was constant in each period. Since the implied
impatience rates form a flat line across all distances for a given level of utility, they are
not dependent on the limit order level. This means that $r$ can be used as a reliable and
well-defined quantifier of the trade-off between waiting longer for a better price and
waiting less for a worse price.







Figure 6: Expected utility (left axis) and mid-price (right axis) for the sell side of the 
book in the DEJD setting. The utility is not as smooth as in the GBM setting, but still 
a very good fit. (FDAX, August 20, 2010) 
Figure 7: Impatience rate (left axis) and mid-price (right axis). The DEJD framework
provides for a smooth, level-independent r, where the GBM setting fails. (FDAX, August
20, 2010)



8 Conclusion 

We have shown that the assumption of an equilibrium in the limit order book (LOB) is 
not consistent with the price dynamics given by a geometric Brownian motion (GBM). 
Instead, in equilibrium, the LOB implies a volatility smile that is reminiscent of, but 
unrelated to, the one known from option-pricing. This observation necessarily leads to a 
fat-tailed distribution of log-returns^16. The most natural explanation of this phenomenon
is the occurrence of jump processes in the price dynamics.

We further demonstrate how an impatience rate is implied from empirical observations, 
so that it is consistent with the assumption of market efficiency. 

There are several directions for further research of the proposed framework. A natural 
extension would be the study of different jump-diffusion models, which provide greater 
parameter stability. In addition one could investigate stochastic volatility models as an 
alternative to incorporating the volatility smile. Furthermore it would be instructive to 
compare the jump parameters presented in this paper with the results from standard 
techniques that are based on a direct analysis of the time-series. 
16 In [Kou02] it is shown that this is the case in the DEJD model.
Figure 8: r against D. Apparently much less noisy than in the GBM framework, but at first
sight still very scattered. Notice, however, the black stars (r from t = 3 to t = 3.75) and
the red triangles (r from t = 5.25 to t = 5.75), as a function of D in two different market
modes. In the absence of market sentiment shifts r is nearly constant. Changes in r could
indicate a shift in sentiment. It is also apparent that, in contrast to the GBM setting, the
impatience rate in the DEJD model is a function of neither the distance-to-fill nor the
volatility.

A Additional Plots 
(c) GBM Seller (d) DEJD Seller

Figure 9: Impatience rates and mid-price.

Figure 10: Impatience rates against the risk-adjusted relative distance-to-fill $z = \log\big((S + D)/S\big)$.

Figure 11: Expected utilities and mid-price.



B Proof of Theorem 6.1



A number of preliminary results are needed in order to prove the statement. We begin by
formulating the Itô lemma for semi-martingales (e.g. jump-diffusions), and then apply it
to our process to obtain the solution.

Lemma B.1 (Itô's Lemma for Semi-Martingales)^17. Let $X$ be a semi-martingale, and
let $f$ be a real $C^{2,1}$ function. Then $f(X, t)$ is again a semi-martingale, and the following
formula holds:

$$df(X_t, t-) = f_t(X_{t-}, t-)\,dt + f_x(X_{t-}, t-)\,dX_t + \frac{1}{2}f_{xx}(X_{t-}, t-)\,d\langle X\rangle^c_t + \Delta f(X_t, t-) - f_x(X_{t-}, t-)\,\Delta X_t \tag{35}$$



17 This formulation is essentially the one given in [Pro90], but has been adapted to the given differential
form using the exposition in [Bel05], as well as [Oks03]. The convenient notation for the derivatives of $f$
has been inspired by [Shr04].
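with the following notations:

$$f_t(X, t) := \frac{\partial f}{\partial t}(X, t), \qquad f_x(X, t) := \frac{\partial f}{\partial x}(X, t), \qquad f_{xx}(X, t) := \frac{\partial^2 f}{\partial x^2}(X, t),$$

$$t- := \lim_{\varepsilon \downarrow 0}(t - \varepsilon), \qquad \Delta X_t := X_t - X_{t-},$$

$$\Delta f(X_t, t-) := f(X_t, t-) - f(X_{t-}, t-).$$

Also $d\langle X\rangle^c_t$ is the differential of the quadratic variation process of the continuous part of $X$.

Proof. A proof is given in [Pro90], p. 78. ■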

As a consequence the following corollary for the Doléans-Dade stochastic exponential
for semi-martingales can be formulated:

Corollary B.1 (Stochastic Exponential for Semi-martingales)^18. Let $X_t$ be a semi-martingale,
$X_0 = 0$. Then there exists a unique semi-martingale $Z_t$ satisfying $dZ_t = Z_{t-}\,dX_t$
with $Z_0 = 1$. $Z_t$ is given by:

$$Z_t = \exp\Big(X_t - \frac{1}{2}\langle X\rangle^c_t\Big)\prod_{s \le t}(1 + \Delta X_s)\exp(-\Delta X_s) \tag{36}$$

Proof. Cf. [Pro90], p. 84. ■
Proof of Theorem 6.1. Consider the following function

$$f(x, t) := xe^{\mu t}, \qquad f_x(x, t) = e^{\mu t}, \qquad f_{xx}(x, t) = 0, \qquad f_t(x, t) = \mu xe^{\mu t} = \mu f(x, t).$$

We look for a solution of the form

$$X_t = C_t e^{\mu t}$$

for some semi-martingale $C_t$. Applying the Itô formula for semi-martingales we obtain:

$$dX_t = df(C_t, t) = e^{\mu t}\,dC_t + \mu C_t e^{\mu t}\,dt.$$

Notice that under the assumption $X_t = C_t e^{\mu t}$, and with $J_t := \sum_{i=1}^{N_t}(V_i - 1)$, the SDE can be rewritten as:

$$dX_t = C_t e^{\mu t}(\sigma\,dB_t + dJ_t + \mu\,dt).$$

Comparing the last two equations, we conclude that:

$$dC_t = C_t(\sigma\,dB_t + dJ_t),$$

and by virtue of the previous corollary, we know that the unique solution for $C_t$ is given
by:

$$\begin{aligned}
C_t &= S\exp\Big(\sigma B_t + \sum_{i=1}^{N_t}(V_i - 1) - \frac{1}{2}\sigma^2 t\Big)\prod_{i=1}^{N_t} V_i\exp\big(-(V_i - 1)\big) \\
&= S\exp\Big(\sigma B_t - \frac{1}{2}\sigma^2 t\Big)\prod_{i=1}^{N_t} V_i,
\end{aligned}$$



18 Again, this formulation has been adapted from Theorem 37 in [Pro90], p. 84, to be in the more
convenient differential form and in the shorter form, as given in the theorem's proof there.
which yields

$$X_t = C_t e^{\mu t} = S\exp\Big(\Big(\mu - \frac{\sigma^2}{2}\Big)t + \sigma B_t\Big)\prod_{i=1}^{N_t} V_i,$$

concluding the proof^19.

Note that (18) is the same as:

$$S_t = S\exp\Big\{\Big(\mu - \frac{\sigma^2}{2}\Big)t + \sigma B_t + \sum_{i=1}^{N_t} Y_i\Big\} \tag{37}$$



C Proof of Theorem 6.2



Definition C.1 (Poisson Process). A right-continuous process $(N_t)_{t\ge 0}$ with state space
$\mathbb{N}_0$ is a time homogeneous Poisson process with intensity rate $\lambda$ iff the following is true:

a) $N_0 = 0$ a.s.;

b) the process has stationary, independent increments, which are Poisson distributed,
i.e. for all $s, t \ge 0$: $N_{s+t} - N_s \sim \mathrm{Poi}(\lambda t)$.

Definition C.2 (Compound Poisson Process). Let $(N_t)_{t\ge 0}$ be a time homogeneous Poisson
process with intensity rate $\lambda$ and $(Y_n)_{n\in\mathbb{N}}$ a family of iid random variables, and let the
family further be independent of $(N_t)_{t\ge 0}$. The process $(C_t)_{t\ge 0}$, defined by

$$C_t := \sum_{i=1}^{N_t} Y_i,$$

is called a compound Poisson process.

Lemma C.1. For a time homogeneous compound Poisson process $(C_t)_{t\ge 0}$ with intensity
rate $\lambda$ and square-integrable $Y_i$ the following holds:

$$\mathbb{E}[C_t] = \lambda t\,\mathbb{E}[Y_1] \quad\text{and}\quad \mathbb{V}[C_t] = \lambda t\,\mathbb{E}[Y_1^2].$$

Proof. Cf. [MS05], Theorem 10.24, iii). ■

We will also need a slight variation of this Lemma concerning the moments of a
"Poisson product" of iid random variables, which we will prove:

Lemma C.2. Let $(N_t)_{t\ge 0}$ be a time homogeneous Poisson process with intensity rate $\lambda$
and let $t > 0$ be fixed. Further let $(V_i)_{i\in\mathbb{N}}$ be a family of iid random variables, which is
also independent of the Poisson process. Then for the "Poisson product"

$$P_t = \prod_{i=1}^{N_t} V_i \tag{38}$$

the following holds:

$$\mathbb{E}[P_t] = \exp\big(t\lambda(\mathbb{E}[V_1] - 1)\big) \tag{39}$$

and

$$\mathbb{V}[P_t] = \exp\big(t\lambda(\mathbb{E}[V_1^2] - 1)\big) - \exp\big(2t\lambda(\mathbb{E}[V_1] - 1)\big). \tag{40}$$

19 An intuitive proof of this with omission of technicalities is given in [Pol06].

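Proof. We employ the iid property of the family $(V_i)_{i\in\mathbb{N}}$ and its independence of $N$. We
also make use of the probability mass function of a Poisson random variable and the series
representation of the exponential function:

$$\begin{aligned}
\mathbb{E}[P_t] &= \mathbb{E}\Big[\prod_{i=1}^{N_t} V_i\Big] = \sum_{k=0}^{\infty}\mathbb{E}\Big[\prod_{i=1}^{k} V_i\Big]\frac{(t\lambda)^k}{k!}e^{-t\lambda} \\
&= e^{-t\lambda}\sum_{k=0}^{\infty}\frac{(\mathbb{E}[V_1]\,t\lambda)^k}{k!} = e^{-t\lambda}e^{\mathbb{E}[V_1]t\lambda} \\
&= \exp\big(t\lambda(\mathbb{E}[V_1] - 1)\big).
\end{aligned}$$

For the variance we will need the following:

$$\mathbb{E}[P_t^2] = \mathbb{E}\Big[\Big(\prod_{i=1}^{N_t} V_i\Big)^2\Big] = \mathbb{E}\Big[\prod_{i=1}^{N_t} V_i^2\Big] = e^{-t\lambda}\sum_{k=0}^{\infty}\frac{(\mathbb{E}[V_1^2]\,t\lambda)^k}{k!} = \exp\big(t\lambda(\mathbb{E}[V_1^2] - 1)\big),$$

so the variance is

$$\mathbb{V}[P_t] = \mathbb{E}[P_t^2] - \mathbb{E}[P_t]^2 = \exp\big(t\lambda(\mathbb{E}[V_1^2] - 1)\big) - \exp\big(2t\lambda(\mathbb{E}[V_1] - 1)\big).$$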



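Proof of Theorem 6.2. a) First we show how the first two moments of a double-exponential
random variable are calculated:

$$\mathbb{E}[Y] = \mathbb{E}[U\xi^+ - (1 - U)\xi^-] = \frac{p}{\eta_1} - \frac{q}{\eta_2}.$$

For the variance we have, using $U(1 - U) = 0$:

$$\begin{aligned}
\mathbb{V}[Y] &= \mathbb{E}[(U\xi^+)^2] - 2\,\mathbb{E}[U(1 - U)\xi^+\xi^-] + \mathbb{E}[((1 - U)\xi^-)^2] - \mathbb{E}[Y]^2 \\
&= \frac{2p}{\eta_1^2} + \frac{2q}{\eta_2^2} - \frac{p^2}{\eta_1^2} + \frac{2pq}{\eta_1\eta_2} - \frac{q^2}{\eta_2^2} \\
&= pq\Big(\frac{1}{\eta_1} + \frac{1}{\eta_2}\Big)^2 + \frac{p}{\eta_1^2} + \frac{q}{\eta_2^2}.
\end{aligned}$$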
b) Now we turn to the first two moments of $V = \exp(Y)$ from Theorem 6.1:

$$\begin{aligned}
\mathbb{E}[V] &= \mathbb{E}[\exp(Y)] = \mathbb{E}[\exp(U\xi^+ - (1 - U)\xi^-)] \\
&= p\,\mathbb{E}[\exp(\xi^+)] + q\,\mathbb{E}[\exp(-\xi^-)] = p\frac{\eta_1}{\eta_1 - 1} + q\frac{\eta_2}{\eta_2 + 1},
\end{aligned}$$

where in the last step we used the moment generating function of the exponential
distribution (cf. [MS05], p. 102). Note that for the first moment of $V$ to be
finite, $\eta_1 > 1$ is required, and for $\mathbb{E}[V^2] < \infty$, $\eta_1 > 2$ is needed (cf. [MS05], p. 103).
Note that we have explicitly imposed those constraints in our model. Now we turn
to the variance. First we obtain:

$$\begin{aligned}
\mathbb{E}[V^2] &= \mathbb{E}[\exp(2Y)] = \mathbb{E}[\exp(2U\xi^+ - 2(1 - U)\xi^-)] \\
&= p\,\mathbb{E}[\exp(2\xi^+)] + q\,\mathbb{E}[\exp(-2\xi^-)] = p\frac{\eta_1}{\eta_1 - 2} + q\frac{\eta_2}{\eta_2 + 2},
\end{aligned}$$

and again, the last step was produced by using the moment generating function of
an exponential distribution. So, in summary, the variance of $V$ is given by:

$$\mathbb{V}[V] = \mathbb{E}[V^2] - \mathbb{E}[V]^2 = p\frac{\eta_1}{\eta_1 - 2} + q\frac{\eta_2}{\eta_2 + 2} - \Big(p\frac{\eta_1}{\eta_1 - 1} + q\frac{\eta_2}{\eta_2 + 1}\Big)^2.$$

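c) This has been considered in the previous lemma.

d) Next we turn to the moments of the stochastic process itself:

$$\begin{aligned}
\mathbb{E}[S_t] &= \mathbb{E}\Big[S\exp\Big(\Big(\mu - \frac{\sigma^2}{2}\Big)t + \sigma B_t\Big)\prod_{i=1}^{N_t} V_i\Big] \\
&= S\,\mathbb{E}\Big[\exp\Big(\Big(\mu - \frac{\sigma^2}{2}\Big)t + \sigma B_t\Big)\Big]\,\mathbb{E}\Big[\prod_{i=1}^{N_t} V_i\Big] \\
&= S\exp(\mu t)\,\mathbb{E}[P_t],
\end{aligned}$$

where $P_t$ is the "Poisson product" from (38), and the term $e^{\mu t}$ was obtained as the
first moment of a log-normally distributed random variable, which corresponds to the
distribution of the GBM process in time. Next, we look at the second moment of
the DEJD:

$$\begin{aligned}
\mathbb{E}[S_t^2] &= S^2\,\mathbb{E}\Big[\exp\Big(2\Big(\mu - \frac{\sigma^2}{2}\Big)t + 2\sigma B_t\Big)\prod_{i=1}^{N_t} V_i^2\Big] \\
&= S^2\exp(2\mu t + \sigma^2 t)\,\mathbb{E}[P_t^2],
\end{aligned}$$

so in summary the variance is given by:

$$\mathbb{V}[S_t] = \mathbb{E}[S_t^2] - \mathbb{E}[S_t]^2 = S^2e^{2\mu t}\big(e^{\sigma^2 t}\,\mathbb{E}[P_t^2] - \mathbb{E}[P_t]^2\big).$$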

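e) Finally, after characterizing the process itself, we take a look at the log-returns
$l(\Delta t)$, which are of particular importance for our empirical work, as we calibrate
our model parameters so as to match the rolling volatility and mean of the observed
log-returns in the market. By the stationarity of the increments:

$$\begin{aligned}
\mathbb{E}[l(\Delta t)] &= \mathbb{E}\Big[\log\frac{S_{t+\Delta t}}{S_t}\Big] \\
&= \mathbb{E}\Big[\Big(\mu - \frac{\sigma^2}{2}\Big)\Delta t + \sigma B_{\Delta t} + \sum_{i=1}^{N_{\Delta t}} Y_i\Big] \\
&= \Big(\mu - \frac{\sigma^2}{2}\Big)\Delta t + \lambda\Delta t\,\mathbb{E}[Y] \\
&= \Big(\mu - \frac{\sigma^2}{2}\Big)\Delta t + \lambda\Delta t\Big(\frac{p}{\eta_1} - \frac{q}{\eta_2}\Big),
\end{aligned}$$

where the penultimate step was achieved by observing that $\sum_{i=1}^{N_{\Delta t}} Y_i$ is a compound
Poisson process and applying Lemma C.1.

We obtain the variance as the sum of the variances of two independent random variables,
namely a scaled Brownian motion with drift and a compound Poisson process:

$$\begin{aligned}
\mathbb{V}[l(\Delta t)] &= \mathbb{V}\Big[\Big(\mu - \frac{\sigma^2}{2}\Big)\Delta t + \sigma B_{\Delta t} + \sum_{i=1}^{N_{\Delta t}} Y_i\Big] \\
&= \mathbb{V}\Big[\Big(\mu - \frac{\sigma^2}{2}\Big)\Delta t + \sigma B_{\Delta t}\Big] + \mathbb{V}\Big[\sum_{i=1}^{N_{\Delta t}} Y_i\Big] \\
&= \sigma^2\Delta t + \lambda\Delta t\Big(\frac{2p}{\eta_1^2} + \frac{2q}{\eta_2^2}\Big).
\end{aligned}$$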
D Descriptive Statistics

Figure 12: Descriptive statistics per day: $r$ is the mean impatience rate, $\sigma_r$ is its standard deviation, $U$ is the expected utility and $\sigma_U$ is
its standard deviation. Notice that $\sigma_r$ is much lower for the DEJD.

Figure 13: Continued - descriptive statistics per day: $r$ is the mean impatience rate, $\sigma_r$ is its standard deviation, $U$ is the expected utility
and $\sigma_U$ is its standard deviation. Notice that $\sigma_r$ is much lower for the DEJD.
The limit order book at a point in time

Figure 14: A stylized snapshot of the LOB with three levels of information at an arbitrary
point in time. The mid-price is 3.75, and is the mid-point between the highest bid and
the lowest ask offer, which are in this case 3.50 and 4.00 respectively. The block sizes
represent the total number of orders at each level in the book. Block sizes below the
abscissa represent bid-order sizes, while those above represent ask-order sizes.



The limit order book (LOB) is the collection of buy and sell orders at any point in
time. We will explain how it works from the perspective of a buyer. Consider the example
of a stylized LOB given in Figure 14. There are six buy orders at a price of 2.50, two buy
orders at a price of 3.00, and three orders to buy at 3.50. Note that we do not know whether
the order sizes at each level represent order submission by a single participant, or are simply
aggregated by the exchange by their limit price. For our purposes it is not important who
exactly places which orders at which level, so we can safely assume that all orders at each
level are placed by a single trader. Currently the best buy order is the one at 3.50 for
three shares. Assume a new buyer wants to enter the market. They could place a limit
order at 3.50 or less, but will have to wait until the current orders at 3.50 are matched
by a seller, or are cancelled by the respective buyer. They could alternatively place a limit
order between 3.50 and 4.00, thus narrowing the bid-ask spread, but declaring that they
are willing to pay more than the current best bid offer. This would move the indicative
mid-price up. Or, lastly, they could place a market order. Assume they place a market
order for four shares. In this scenario the two limit sell orders at 4.00 and the two limit
sell orders at 4.50 will be executed, and the order will be fully filled. The outermost ask level
at 5.00 will become the best ask offer, and the mid-price will move from 3.75 to 4.25.
Finally, consider the following strategy for our hypothetical buyer, who wants to buy four
shares at a price of 4.00. They could place a market order for two shares, which will
immediately be matched by the best ask price at 4.00, and also place a limit buy order
at 4.00 for another two shares, thus becoming the best bidder at 4.00. Notice also that in
the last scenario, the limit buy order at 2.50 will become invisible to a market participant
who only has access to the best three offers on each side of the mid-price. In exchange,
a new limit sell order above 5.00 will potentially become visible, as the best sell order
got filled and the participant is entitled to see only the best three sell orders.
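
The market-order scenario above can be replayed with a small Python sketch. It is purely illustrative: the dictionary representation of the book and the function names are hypothetical, and the size of one order at the 5.00 ask level is an assumption, since the text does not state it.

```python
def mid_price(bids, asks):
    """Mid-point between the highest bid and the lowest ask."""
    return (max(bids) + min(asks)) / 2.0


def market_buy(asks, quantity):
    """Fill a market buy order against the ask side, best price first.

    Returns the remaining ask side and the list of (price, size) fills.
    """
    book = dict(asks)
    fills = []
    for price in sorted(book):
        if quantity == 0:
            break
        traded = min(quantity, book[price])
        fills.append((price, traded))
        quantity -= traded
        book[price] -= traded
        if book[price] == 0:
            del book[price]          # level fully consumed
    return book, fills


# Stylized book of the example; the size at 5.00 is assumed.
bids = {3.50: 3, 3.00: 2, 2.50: 6}
asks = {4.00: 2, 4.50: 2, 5.00: 1}
remaining_asks, fills = market_buy(asks, 4)
```

Here the mid-price is 3.75 before the trade, the market order for four shares consumes the levels at 4.00 and 4.50, and the mid-price computed from `bids` and `remaining_asks` is 4.25 afterwards.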

F Empirical Fitting 

In this section we first give the reader an idea of what the raw LOB data looks like,
what particular type of data we have had at our disposal, and what is an efficient way to
organize it for further analysis. Next, we present our method of calibration of the
model parameters to market data, and finally we describe the specifics of our fitting
procedure, with which the impatience rate is estimated.

An excerpt of raw LOB data for August 20, 2010 is given in Figure 15.
The data consists of a time stamp in milliseconds, the level at which a change in the 
LOB has occurred, followed by the updated bid limit price, ask limit price, bid order size 
and ask order size. Notice that the data simply gives an update at each level if something 
changes and is in itself not the LOB. Much rather it is our task to reconstruct from this 
data set the actual snapshot of the LOB at each point in time. In the example of Figure 
[T31 the first three orders timestamped 13:59:32:367 update the order at levels three, four 
and zero. Notice that level zero is designated to be the level at which the best bid and 
the best ask orders are given. Thus, the spread at the beginning of this excerpt is 0.5 - 
the difference between 6023.5 and 6024 - which is also the tick-size, i.e. the minimal price 
increment, of the DAX future. Then, at time 13:59:32:383 updates to the levels one and 
two are seen (previous values not given in this excerpt), and an update to the ask size in 
level three - it has gone down from 32 to 30. Of particular interest is the event of order 
execution. In our example we see an order being executed at level zero at 13:59:32:397, 
which is indicated by a price matching of, in this case, 6024, which was the previous best 
ask offer. The transaction size is the lesser of the two order sizes at that level, which is one 
in the given example. We cannot tell whether it is the agent with the previous best bid offer of 
6023.5 who has cancelled all or part of their orders and placed a market order or a limit 
buy order at 6024, which was immediately executed. What seems more plausible 
in this case is that a new agent has come to the market and placed a market order for one 
futures contract: the follow-up update at 13:59:32:413 shows that there are now 
even more orders at 6023.5, which now sit at level one (previously zero) on the bid side, 
i.e. this price is no longer the best bid offer. Notice also how in the time immediately 
after the order execution, an automatic update of levels takes place on the ask side - what 
was previously at level three, is now at level two, what was previously at level four, is 
now at level three and so forth. 

Figure 15: Raw data of the nearest DAX future (FDAX) of August 20, 2010 with a depth 
of six levels (zero to five) on each side of the book. 

It is unfortunate that our data does not contain order 
flags that explain exactly what has happened at each time step, i.e. explicit 
information on whether an order has been executed or, more interestingly since it cannot 
be reliably reconstructed from this data, whether an order has been cancelled. However, using 
interpretation techniques as demonstrated in the previous paragraph, one can reconstruct 
the LOB at each time step and have a complete picture of what the LOB looks like at 
each point in time. This is also the way we organize our data for further analysis. In 
order to have a better understanding of how exactly we have reconstructed the LOB from 
the raw data, please see Appendix C, where the data from Figure 15 has been arranged 
in a more convenient way for further analysis. 
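
The reading of a single raw update described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record type `Update`, its field names, and the helper functions `spread` and `execution_size` are hypothetical and not part of the data vendor's format, and the order sizes in the usage note, other than the traded size of one, are made up. An update at level zero whose bid and ask prices coincide is treated as an execution, and the transaction size is taken to be the lesser of the two sizes.

```python
from typing import NamedTuple, Optional


class Update(NamedTuple):
    """One raw record; the field names are illustrative assumptions."""
    time: str         # time stamp, e.g. "13:59:32:397"
    level: int        # 0 is the level of the best bid and best ask
    bid_price: float
    ask_price: float
    bid_size: int
    ask_size: int


def spread(u: Update) -> float:
    """Ask price minus bid price at the level of the update."""
    return u.ask_price - u.bid_price


def execution_size(u: Update) -> Optional[int]:
    """Return the traded size if the update looks like an execution.

    An execution is inferred from matching prices at level zero; the
    size is the lesser of the two order sizes. Cancellations cannot be
    identified this way, so None only means "no execution detected".
    """
    if u.level == 0 and u.bid_price == u.ask_price:
        return min(u.bid_size, u.ask_size)
    return None
```

For instance, `execution_size(Update("13:59:32:397", 0, 6024.0, 6024.0, 1, 7))` gives a traded size of one contract, while an ordinary quote such as `Update("13:59:32:367", 0, 6023.5, 6024.0, 5, 7)` has a spread of 0.5 and is not flagged as an execution.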

G Formatted LOB Snapshots 

Figure 16: Consecutive LOB state snapshots, based on the example of Figure 15. FDAX, 20 August 2010. T is the time stamp, BP and 
AP are respectively the bid and ask limit order prices, BS and AS are respectively the bid and ask sizes at the given level. 

This is an example of how the LOB is organized for further use. It is based on data from Figure 15 and shows the state of the LOB at 
each point in time at which the LOB has been updated. It is assumed that the LOB was blank before the first time stamp of the 
excerpt in Figure 15. What follows is simply a horizontal collection of all orders with the same time stamp. If nothing has changed at a 
given level, then simply the state from the previous time step is copied. A cancellation would not be explicitly given, but would result in 
orders from higher levels in the LOB taking up the place of the cancelled order. If there are no orders at higher levels, then the cancelled 
order would be filled with zeros. Notice that several updates per time step are possible, including changes of the level of existing orders, 
incoming new orders, cancellation of previous orders, changes in the price or the order size of orders at a given level, and also execution 
of orders by matching, or submission of a market order by a new market participant. Observe the order execution at 13:59:32:397 and 
how in the next period the exchange does not quote a next best offer. Note also how the orders from upper ask levels move down to lower 
levels immediately after the execution of what appears to have been a market order at 13:59:32:397, and how the previous best bid offer of 
6023.5 reappears at level one, rather than level zero, as the second best bid offer. 
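
The snapshot rule described above (start from a blank book, apply all updates that share a time stamp, and copy every untouched level from the previous step) can be sketched as follows. This is a minimal sketch and not necessarily the procedure used to produce Figure 16: the tuple layout of a record, the names `build_snapshots`, `DEPTH` and `EMPTY`, and the assumption that records arrive in chronological order with levels between zero and five are all illustrative.

```python
from itertools import groupby
from typing import List, Tuple

# Assumed record layout: (time, level, bid_price, ask_price, bid_size, ask_size)
Record = Tuple[str, int, float, float, int, int]
# One level of the book: (BP, AP, BS, AS)
Row = Tuple[float, float, int, int]

DEPTH = 6                       # six levels, zero to five, per side
EMPTY: Row = (0.0, 0.0, 0, 0)   # a blank level is filled with zeros


def build_snapshots(records: List[Record]) -> List[Tuple[str, List[Row]]]:
    """Turn chronological level updates into one LOB snapshot per time stamp."""
    book: List[Row] = [EMPTY] * DEPTH   # the LOB is assumed blank at the start
    snapshots = []
    # groupby merges consecutive records that carry the same time stamp
    for time, group in groupby(records, key=lambda r: r[0]):
        for _, level, bp, ap, bs, as_ in group:
            book[level] = (bp, ap, bs, as_)   # overwrite only the updated level
        # levels not mentioned keep their state from the previous step
        snapshots.append((time, list(book)))
    return snapshots
```

Each element of the returned list pairs a time stamp T with the six rows (BP, AP, BS, AS) of the book at that time, which corresponds to one horizontal line of the formatted layout. Executions and cancellations are not interpreted here; they show up only through the level updates that the exchange sends afterwards.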